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A MIXTURE REPRESENTATION OF tt WITH APPLICATIONS IN 
MARKOV CHAIN MONTE CARLO AND PERFECT SAMPLING 

By James P. Hobert^ and Christian P. Robert^ 
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Let X = {Xn ; n = 0, 1, 2, . . .} be an irreducible, positive recurrent 
Markov chain with invariant probability measure tt. We show that if 
X satisfies a one-step minorization condition, then tt can be repre- 
sented as an infinite mixture. The distributions in the mixture are 
associated with the hitting times on an accessible atom introduced via 
the splitting construction of Athreya and Ney [Trans. Amer. Math. 
Soc. 245 (1978) 493-501] and Nummelin [Z. Wahrsch. Verw. Gebiete 
43 (1978) 309-318]. When the small set in the minorization condition 
is the entire state space, our mixture representation of tt reduces to a 
simple formula, first derived by Breyer and Roberts [Methodol. Corn- 
put. Appl. Probab. 3 (2001) 161-177] from which samples can be easily 
drawn. Despite the fact that the derivation of this formula involves no 
coupling or backward simulation arguments, the formula can be used 
to reconstruct perfect sampling algorithms based on coupling from 
the past (CFTP) such as Murdoch and Green's [Scand. J. Statist. 25 
(1998) 483-502] Muhigamma Coupler and Wilson's [Random Struc- 
tures Algorithms 16 (2000) 85-113] Read-Once CFTP algorithm. In 
the general case where the state space is not necessarily 1-small, un- 
der the assumption that X satisfies a geometric drift condition, our 
mixture representation can be used to construct an arbitrarily accu- 
rate approximation to n from which it is straightforward to sample. 
One potential application of this approximation is as a starting dis- 
tribution for a Markov chain Monte Carlo algorithm based on X. 
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1. Representing tt as a mixture. Let P{x, dy) be a Markov transition 
kernel on a general state space (X,;B(X)) and write the associated discrete 
time Markov chain as X = {X„ : n = 0, 1, 2, . . . }. For t G N := {1, 2, 3, . . . }, 
let P^{x,dy) denote the t-step Markov transition kernel corresponding to 
P. Then for n € N, x € X and a measurable set A, P^{x,A) = Pr(Xt+„ G 
= x). Throughout the paper we assume that X is vr-irreducible and 
positive Harris recurrent where tt is the invariant probability measure; for 
definitions see Meyn and Tweedie [(1993), Part I]. For an arbitrary measure 
jj, and function g, we use the usual notation A) = P*(x, A)fj.{dx) and 
f^i9) = Ix9{x)^J■{dx)■ 
Tlie assumptions we have made guarantee the existence of an m G N, a 
probability measure v on B{X), a small set C with ■7r(C) > and an e > 
such that for any x G C, 

P'^{x,A)>eu{A) V^Ge(X). 

For ease of exposition, we consider only the strongly aperiodic case in which 
m= 1; that is, we assume X satisfies a one-step minorization condition 

(1) P{x,-)>ev{-) VxGC. 

A minorization condition allows for the celebrated splitting construction of 
Athreya and Ney (1978) and Nummelin (1978, 1984). To be specific, if x G C, 
we can use (1) to write P{x,-) as a two-component mixture 

(2) p^x,-) = ei^{-) + {l-e)R{x,-), 

where R{x, •) = (1 — e)~^(P(x, ■) — ev{-)) is called the residual measure; define 
R{x,-) to be if e = 1. 

If Xn G C, then (2) can be used to generate Xn+i sequentially as fol- 
lows. Draw 6n ~ Ber(e). If 6n = 1, then take Xn+i from ;/(•), else take 
Xn+i from R{Xn,-). This can be formalized by introducing the split chain, 
X' = {{Xn,Sn) :n = 0,1, . . .}, which lives on the space X x {0, 1} and has 
Markov transition kernel given by 

P'\(ra\ ,^r.,.f[eS + {l-e){l-5)]P{x,A), if x ^ C, 

^ ii^. Uj, A X |0|j - j ^ _ ^^^^ _ ^^^^^^^ if x G C, 

and 

f [(X, ij, ^ X |d|J - I ^ _ ^^^^ _ if X G C, 

where 5 G {0, 1}. It is clear that, marginally, the sequence {X^ : n = 0, 1, . . . } 
from the split chain is equivalent (in distribution) to X. Also, the measure 
tt' defined by 

^'{A X {5}) = ^{A)[e5 + (1 - e)(l - 5)] 
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is invariant for X' . The key to our argument is that the set a:=C x {1} is 
an accessible atom [Meyn and Tweedie (1993), page 100] and the (random) 
times at which X' enters a are regeneration times. Define Tq, to be the first 
return time to a; that is, 

Ta = min{n > 1 : (X„, 5„) G a}. 

Also, let PrQ,(-) and Eq,(-) denote probability and expectation conditional on 
{Xq,5q) € a; that is, conditional on Xi ^ i^(-)- It follows from Kac's theorem 
[Meyn and Tweedie (1993), Theorem 10.2.2] that Ea(ra) < oo. Hence, we 
may define a nonnegative, nonincreasing sequence {pt}'^i that sums to one 
by putting 

(3) Pt ■ 



EcCTq) 

Now, for any t E N and any measurable set A, define 
(4) Qt(A)=Pr„(XtGA|r„>t); 

that is, Qt is the conditional distribution of Xt given that {Xq,5q) G a and 
that there are no regenerations in the split chain before time t. We now state 
the first of our two main results. 



Theorem 1. Let P he a Markov transition kernel on a general state 
space (X,S(X)). Assume that the associated Markov chain, X, is tt -irreducible 
and positive Harris recurrent where vr is the invariant probability measure. 
Assume further that the minorization condition (1) holds. Then for any 
A G B(X), we have 

oo 

(5) 7r{A) = ^Qt{A)pt, 

t=i 

where pt and Qt{-) are defined at (3) and (4), respectively. 

Proof. Applying Meyn and Tweedie's (1993) Theorem 10.2.1 to X' 
and using the fact that Tr'{A x {0, 1}) = vr(^), we have 

oo oo 

niA) = ^^y^Pr^{Xt €A,Ta> t) = VPr,(Xt eA\Ta>t)pt. ^ 

Representation (5) is appealing from a simulation point of view because 
it reveals the potential for drawing from vr by randomly drawing an element 
from the set {Qi,Q2,Q3, ■ ■ ■} according to the probabilities pi,P2,P3, ■ ■ ■ and 
then making an independent random draw from the chosen Qt. This idea is 
closely related to perfect sampling [Fill (1998) and Propp and Wilson (1996)], 



4 



J. P. ROBERT AND C. P. ROBERT 



which is a simulation method wherein a Markov chain with stationary distri- 
bution vr is used to produce independent and identicahy distributed (i.i.d.) 
samples from vr. In fact, it is shown in Section 2 that when C = X, there are 
direct connections between (5) and perfect sampling. We show in Section 3 
that if X also satisfies a drift condition, our mixture representation can be 
used to construct an arbitrarily accurate approximation of vr from which it 
is easy to sample. Finally, in Section 4 we explain how this approximation 
to TT provides a new method of dealing with the burn-in problem in Markov 
chain Monte Carlo (MCMC). 

2. The case where C = X: perfect sampling. Consider the special case 
in which C = X, which of course implies that X is 1-uniformly ergodic. In 
this case, Xn € C for all n and, hence, Tq ~ Geo(e); that is, Vvct{Ta = t) = 
e(l — eY~^ for t G N. Plugging into (3) yields pt = e(l — so the pts are 
also geometric probabilities. The distribution Qt is also quite simple when 
C = X. Indeed, if there were no regenerations in the split chain before time 
t, this means that, after having drawn Xi ~ u, the residual measure, R, was 
applied t — 1 consecutive times to get Xt. We state this as a corollary. 

Corollary 1. Under the assumptions of Theorem 1, if C = X, then 

oo 

(6) niA) = J2eil-ey-'R'~\u,A), 

t=i 

where R^{i',A) is defined as i^{A). 

It is possible to derive (6) directly without using (5). Indeed, consider 
a Markov transition kernel, M, on the general space (Z,0(Z)) that takes 
the form M{z, •) = ujfi{-) + (1 — uj)K{z, •), where K{z, •) is a ?/;-irreducible 
Markov transition kernel on the same space, //(•) is a probability measure 
on B{Z) and to € (0, 1). Theorem 2 of Breyer and Roberts (2001) shows that 
^j^iuj{l — ujy~^ K^~^{n, ■) is an invariant probability measure for M. Now 
note that when (1) holds with C = X, then P{x, •) = ei'{-) + (1 — e)R{x, •) 
for all X G X and it follows that vr can be written in the form (6). (It is 
interesting to note that M is positive recurrent even if K is badly behaved, 
e.g., transient.) 

Corollary 1 immediately yields the following algorithm for sampling from 

tt: 

Algorithm I. 

1. Simulate xi ~ z/(-) and, independently, t ~ Geo(e). 

2. If t = 1, take xi, else simulate the transition Xn+i ~ R{xn,-) n = 
1,. . . ,t — 1 and take xt- 
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Algorithm I is exactly the Multigamma Coupler of Murdoch and Green 
[(1998), page 486] which is a perfect sampling algorithm based on coupling 
from the past (CFTP) [Propp and Wilson (1996)]. Note that we have used 
our mixture representation of vr to derive this algorithm with no appeal to 
coupling or backward simulation. 

Breyer and Roberts (2001) show, in the context of their catalytic perfect 
simulation algorithm, that Corollary 1 can also be used to derive Wilson's 
(2000) Read-Once CFTP algorithm. We now give a slightly different and 
more detailed description of this connection which culminates in a state- 
ment of the algorithm that Wilson described on page 93 of his paper [Wilson 
(2000)]. We begin with a Markov transition kernel, S, on the general state 
space (X, B{X)), such that the associated Markov chain, Y = {Yq,Yi, . . . }, is 
vr-irreducible and positive Harris recurrent where vr is the invariant proba- 
bility measure. Let g' : X x (0, 1) — > X be a function such that if ?7 ~ Uni(0, 1), 
then for any a; € X and any measurable A, 

PT[g{x,U)eA]=S{x,A). 

Now for n G N let G„ : X x (0, 1)" ^ X be defined through compositions of g 
as follows: 

Gn{x,Ui, ...,Un) =g{g{- ■■g{x,Ui) ■ ■ ■ ,Un-l),Un). 

For example, G'3(x, ui, M2, us) = g{g{g{x,ui),U2),U3). Clearly, if Ui,...,Un 
are i.i.d. Uni(0, 1), then Gn{x, Ui, . . . , Un) has distribution 5"(x, •). If Gn{x, ui, 
is constant in x for some fixed (ni, . . . , m„), we call Gn{x, ui, . . . , ii„) a coa- 
lescent composite map. 

Remark 1. Wilson's setup is actually a bit more abstract than ours. 
First, he does not assume as much as we do about the structure of the 
"random function" g. Second, Wilson assumes that the user possesses an 
efficient, but imperfect method for checking whether Gn{x, ui, . . . ,Un) is co- 
alescent. This method will never incorrectly conclude coalescence, but may 
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Fig. 1. Probability transition diagram for a Markov chain onX— {1,2,3}. 
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miss the fact that a particular ui, . . . , n„) is coalescent. When this 

imperfect method concludes that Gn{x, ui, . . . , Un) is constant in x for some 
fixed (ui, . . . , Un), then Gn{x, ui, . . . , u„) is called "officially coalescent." 

Suppose that A; S N is such that when Ui, . . . ,Uk are i.i.d. Uni(0, 1), we 
have 

(7) Pr[Gfc(x, Ui,. . .,Uk) is coalescent] =e> 0. 

As an example, consider the Markov chain on X = {1,2,3} whose evolution 
is described by the probability transition diagram given in Figure 1. For this 
chain, we could take g as follows: 

gC^^u) = 2, g(.2,u) = 2/(0,1/2) (-u) + 3/[i/2,i)(u) 

and 

5^(3, u) = 7(0,2/3) (u) + 2/[2/3,l) (u) . 

It is not difficult to see that Pr[G'i(x, Ui) is coalescent] = 0, but Pr[G2(x, Ui, U2) 
is coalescent] = 1/4. 

Now if we set P = S^, then for any measurable set A, 

P{x,A)=Fr[Gk(,x,Ui,...,Uk)eA] 

= ePr[Gfc(x, Ui, . . . , Uk) G A\Gk is coalescent] 

(8) 

+ (1 — e) Pr[Gfc(x, Ui, . . . , Uk) € A\Gk is not coalescent] 
= eL'{A) + {l-e)R{x,A), 

where we have defined and R{x, ■) in an obvious way. Drawing from z/(-) 
is quite simple — ^just simulate i.i.d. copies of {Ui, . . . , Uk) until the observed 
value of Gk{x,Ui, . . . ,Uk) is coalescent. Drawing from R{x,-) can be done 
similarly by waiting for the first noncoalescent value of Gk{x,Ui, . . . ,Uk)- 
Therefore, if e is known, Algorithm I can be applied to make draws from vr. 

The beauty of (8), however, is that it can be used to simulate from vr 
even when e is unknown! Indeed, assume that (7) holds, but that the exact 
value of £ is unknown. Note that application of Algorithm I does not require 
knowledge of e, it only requires the ability to simulate from the Geo(e) dis- 
tribution. But simulating Z ~ Geo(e) is easy — ^just generate i.i.d. copies of 
(C/i, . . . , Uk) and let Z be the number of trials required until the first coales- 
cent Gk{x,Ui, . . . ,Uk) is observed. Moreover, the byproducts of simulating 
Z are Z — 1 i.i.d. noncoalescent Gk{x,Ui, . . . , Uk)s and an independent coa- 
lescent Gk {x,Ui, . . . ,Uk)- All of this is formalized in the following algorithm, 
which describes how to use (8) to simulate from vr. 



Algorithm II. 
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1. Simulate a coalescent Gk{x, Ui, . . . ,Uk)- Call its value s. 

2. Draw independent copies of (Ui, . . . ,17^) (where the components are k 
i.i.d. uniforms) until the observed value of Gk{x, Ui, . . . , Uk) is coalescent. 
Let t denote the number of trials required and write the t observed values 
of {Ui,...,Uk) as {ui^i,.. .,Ui^k), i = l,...,t. 

3. Take 

Gfc(- ■ ■Gk{s,ut-i,i, . . .,ut-i,k) ■ ■ ■,ui,i, . . . ,Mi,fc). 

Algorithm II is exactly Wilson's (2000) Read-Once CFTP algorithm. It 
lends itself to iteration. Indeed, Gk{x,ut^i, ■ ■ ■ jUt^^) is coalescent and is not 
used at step 3. Thus, it can be used at step 1 of the next iteration of the 
algorithm. 

We end this section with an interesting interpretation of Corollary 1. 
Because C = X, we can assume that all transitions of X are made using (2). 
Now, each time a regeneration occurs, that is, each time a draw is made 
from we have to wait a Geo(e) number of iterations before the next 
draw from z/. And, of course, the residual measure, R, is used in between. 
Thus, what Algorithm I is actually doing is returning the states immediately 
prior to the draws from v. Hence, an intuitive way to state Corollary 1 is as 
follows: The states of the Markov chain immediately prior to regenerations 
have distribution vr. Wilson (2000) attempts to connect this to the PASTA 
(Poisson Arrivals See Time Averages) phenomenon from the continuous time 
literature. 

3. The case where C 7^ X: approximating tt. While things are more 
difficult when C 7^ X, it is still possible to make draws from the distribution 
Qt using a simple accept-reject algorithm. All that is required is the ability 
to simulate the split chain. Note that Qi{-) = i^(-)) so in the algorithm, it is 
assumed that t>2. 

Algorithm III. 

1. Take (xo,5o) £ and simulate the split chain for t iterations. 

2. If, for each i = 1, 2, . . . , t — 1, {xi,Si) ^ a, then take xt] otherwise, repeat. 

Jones and Robert (2001) provide some practical advice on simulating the 
split chain. 

Let T denote a discrete random variable with support N such that Pr(r = 
t) = pt. The ability to randomly draw an element from the set {Qi,Q2,Q3, ■ ■ ■} 
according to the probabilities pi,p2,P3, . ■ . is tantamount to being able to 
simulate T. While T ~ Geo(e) when C = X, its distribution is quite compli- 
cated when C 7^ X. On the other hand, making i.i.d. draws from the distri- 
bution of Tq, is straightforward — ^just take Xi ~ run the split chain, and 
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count the number of iterations until the first regeneration. Unfortunately, 
despite the simple relationship between their mass functions, it is not clear 
how to use i.i.d. draws from the distribution of to get i.i.d. draws from 
the distribution of T. Hence, we focus on using (5) to construct an approx- 
imation to vr. 

Let {ptjj^i denote another nonnegative sequence that sums to one and let 
T denote the corresponding discrete random variable, that is, Pr(T = t) =Pt- 
Consider an approximation of ir given by 



(9) 



Note that 



H-)=j:Qt{-)Pt 



t=i 



vr 



■vr 



pt 



t=i 



t=i 



■pt\- 



t=i 



Thus, the total variation distance between the distributions vr and vr is 
bounded above by twice the total variation distance between the distri- 
butions of T and T. 

We now show that, under an additional assumption on X , for any given 
7 > 0, it is possible to construct a sequence {pt}tZi such that J2u=i \Pt — 
Pj| < 7 and such that making i.i.d. draws from the distribution of T is 
straightforward. The assumption is that the Markov chain X satisfies a 
geometric drift condition, that is, for some function 1/:X— > [l,oo), some 
A < 1 and some b < oo, we have 



(10) 



PV{x)<XV{x) + bIc{x) VxgX, 



where PV{x) := Jy_V{y) P{x,dy). It is well known that (10) combined with 
the smallness of the set C implies that the Markov chain X is geometrically 
ergodic [Meyn and Tweedie (1993), Chapter 15]. We will need the following 
result, which is a combination of Theorems 2.3 and 4.1 in Roberts and 
Tweedie (1999). 



Theorem 2 [Roberts and Tweedie (1999)]. Let P be a Markov transi- 
tion kernel on a general state space (X,S(X)). Assume that the associated 
Markov chain, X , is tt -irreducible and positive Harris recurrent where vr is 
the invariant probability measure. Assume further that the minorization con- 
dition (1) and the drift condition (10) both hold. Define d = sup^^(j V {x) , 
A = supx^c PV {x) and J = {A — e)/X. Then the generating function Ea{P^°') 
converges for /? G (1,/?*) where: 

1. If J <l, then p* = X-\ 
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2. If J>1, then 



j3* = exp 



logAlog(l -e) 



< A' 



. log J - log(l - e) 
Furthermore, letting cl){[i) = log [3 / log , if (3 e {1, (3*), then 



(11) 



Pr^{Ta>t)<l3[„{V)] 
=:g{f3,e,J)r'- 



1 -/?(!-£) 



l-{l-e){J/{l-£))m 



Remark 2. In applications I'iV) may be difficult to calculate. Fortu- 
nately, there is a simple upper bound. Indeed, an application of Lemma 1 
from Hobert, Jones, Presnell and Rosenthal (2002) yields iy{V) < Tr{V)/[£Tr{C)]. 
From (10) we know that 7r(F)/7r(C) < 6/(1 -A) and, hence, i/(F) < 6/[e(l- 
A)]. 

We are now in a position to state the second of our two main results. 

Theorem 3. Assume the hypotheses of Theorem 2. Fix 7 > and (3 G 
(l,/3*). Let T he the random variable supported on {1,...,M} with proba- 
bilities 



Pt 



Pra{Ta>t) 



where M is any integer larger than 

-2gij3,e,J) 



log 



7(/3-l) 



log p. 



Then ||vr(-) — '7r(-)|| < 7- 



Proof. First, 

00 M 



t=i 



t=i 



Pra(ra>t) 



Pr„(T„>t) 



2 E 



Ea(Ta) EfilPra(ra>s) 
Fra{Ta>t) 



t=M+l 



Ea(TQ 



+ E 

t=M+l 



Pra(ra >t) 
Ea(ra) 



Thus, since Eq,(tq) > 1, it suffices to show that J2'^M+i^^a{Ta ^t) < 7/2. 
Using (11) from Theorem 2, we have 

E Pr.(r.>t)<5(/3,£,J) E = ^^f^/?"*' 



t=M+l 



t=M+l 



(5-1 



10 



J. p. HOBERT AND C. P. ROBERT 



and the result follows from the assumption on M. □ 

Of course, Theorem 3 is useful from a practical standpoint only if it is 
possible to sample from the distribution of T. To this end, consider the 
random vector (F, 1^), where V and W are independent, V is uniform on 
{1,...,M}, and W is equal in distribution to Tq. when [Xq^5q) G a. Note 
that, for any t € {1, . . . , M}, we have 

Hence, the following algorithm can be used to sample from the distribution 
of T. 



Algorithm IV. 

1. Draw V ~ Uni{l, . . . , M} and, independently, draw w from the distribu- 
tion of Ta with {xo,5o) £ a. 

2. U w > V, take v; otherwise, repeat. 

We conclude that, given any 7 > 0, Algorithms III and IV can be used to 
make i.i.d. draws from vf satisfying |l7r(-) — 7t{-)\\ < 7. In the last section we 
briefly describe how our approximation may provide an alternative solution 
to the burn-in problem in MCMC. 

4. An application to burn-in. Suppose that the Markov kernel, P, is the 
basis of an MCMC algorithm whose purpose is to explore vr. Our assumptions 
about P imply that for every initial probability measure /i(-) we have 

||P"(^f,-)-^(-)IUO asn^oo. 

Typically, the MCMC user has no particular starting distribution in mind. 
Indeed, /i(-) is usually taken to be a point mass at some point from which it is 
convenient to start the simulation. An important problem in the implemen- 
tation of MCMC algorithms is burn-in (time), which is formally described 
as follows. Given ^(•) and 7 > 0, we want to find a value n* such that 

(12) ||P"*(m,-)-^(-)II<7. 

If (12) holds, then the marginal distribution of A„ (conditional on Xq ~ fi) 
is within 7 of vr for all n>n*. Hence, n* may be regarded as a reasonable 
time to start sampling the Markov chain. 

Several authors have recently shown that drift and minorization condi- 
tions on the Markov chain can be used to derive computable upper bounds 
on ||P"(^,-) — 7r(-)|| that decrease geometrically fast in n [Douc, Moulines 
and Rosenthal (2004), Meyn and Tweedie (1994), Roberts and Tweddie 
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(1999) and Rosenthal (1995)]. These upper bounds can be used to find an 
n* that satisfies (12). Unfortunately, when this strategy is used for nontoy 
MCMC algorithms, it is not unusual for the resulting n*s to be too large to 
be of any practical value [see, e.g., Jones and Hobert (2004)]. 

Alternatively, a seemingly unnatural way to phrase the burn-in question 
is as follows. Can we find a starting distribution, /Li(-), that is within 7 of vr 
in total variation? If so, we could start sampling the chain immediately. This 
seems unnatural because the stationary distribution of an MCMC algorithm 
is typically intractable and, hence, not easily approximated. Nevertheless, 
the results in the previous section show that we can actually construct such 
a starting distribution. An alternative method of dealing with the burn-in 
problem is to start the chain by drawing Xq ~ vf and using all the samples 
right from the start. 

Of course, tt would normally be constructed using the same drift and 
minorization conditions that are used to construct the upper bounds men- 
tioned above. One might suspect that in situations where the n*s calculated 
using the upper bounds are too large, simulating from vf might be extremely 
inefficient, perhaps to the point where it is not practical. On the other hand, 
TT was derived without using several inequalities that are required in deriving 
the upper bounds. For example, we did not use the coupling inequality, nor 
did we have to worry about constructing a hivariate drift condition using 
the drift on the original chain [see Roberts and Tweedie (1999), Theorem 
5.2]. 

Acknowledgments. The authors are grateful to Laird Breyer, Galin Jones, 
Eric Moulines, Gareth Roberts and Richard Tweedie for helpful discussions, 
and to an anonymous referee for constructive comments and suggestions. 
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